Infrastructure changes preparing for explicit graph construction #1762

Open
Andy-Jost wants to merge 2 commits into NVIDIA:main from Andy-Jost:graph-infra-groundwork
@Andy-Jost
Contributor

@Andy-Jost Andy-Jost commented Mar 12, 2026

Summary

Groundwork for explicit CUDA graph construction (#1317).

Changes

  • HandleRegistry template: Maps raw CUDA handles (CUevent, CUkernel) back to
    their owning shared_ptr via weak_ptr, enabling reconstruction of Python
    objects from driver-returned handles.
  • EventBox metadata: Event properties (timing_disabled, busy_waited,
    ipc_enabled, device_id, context) stored in C++ alongside the CUevent handle,
    accessed via get_box() pointer arithmetic. Replaces cached Python-level fields.
  • Event/kernel reverse-lookup registries: HandleRegistry instantiations for
    events and kernels, with automatic registration/cleanup.
  • Event.from_handle(): Public API for creating non-owning Event objects from
    foreign CUevent handles.
  • Kernel reverse-lookup: Kernel.from_handle now uses the kernel registry
    with library-mismatch warning.
  • IPC cache refactor: Migrated to use HandleRegistry.
  • Package conversion: _graph.py → _graph/__init__.py (rename only).
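
The reverse-lookup idea in the first and third bullets can be sketched as follows. This is a minimal illustration, not the PR's implementation: `unregister_handle`, `lookup`, `FakeHandle`, `FakeBox`, `registry()`, and `make_box` are hypothetical names, and a plain integer stands in for a driver handle such as CUevent. Only `MapType` and the `register_handle` signature follow the snippet quoted in the review comment on this PR.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>

// Sketch: maps a raw key to a weak_ptr of the owned object, so the registry
// can hand back the owning shared_ptr without extending the object's lifetime.
template <typename Key, typename Handle, typename Hash = std::hash<Key>>
class HandleRegistry {
public:
    using MapType = std::unordered_map<Key, std::weak_ptr<typename Handle::element_type>, Hash>;

    void register_handle(const Key& key, const Handle& h) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = h;  // shared_ptr converts to weak_ptr on assignment
    }

    // Hypothetical name: drops the entry for a key, if present.
    void unregister_handle(const Key& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_.erase(key);
    }

    // Hypothetical name: returns the owner, or an empty Handle if the key is
    // unknown or the owner has already been destroyed.
    Handle lookup(const Key& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return Handle{};
        Handle h = it->second.lock();
        if (!h) map_.erase(it);  // prune an expired entry
        return h;
    }

private:
    std::mutex mutex_;
    MapType map_;
};

// Hypothetical stand-ins for a raw driver handle and the object that owns it.
using FakeHandle = std::uintptr_t;
struct FakeBox { FakeHandle raw; };

inline HandleRegistry<FakeHandle, std::shared_ptr<FakeBox>>& registry() {
    static HandleRegistry<FakeHandle, std::shared_ptr<FakeBox>> r;
    return r;
}

// Registers on creation; the custom deleter unregisters on destruction.
inline std::shared_ptr<FakeBox> make_box(FakeHandle raw) {
    std::shared_ptr<FakeBox> box(new FakeBox{raw}, [](FakeBox* p) {
        registry().unregister_handle(p->raw);
        delete p;
    });
    registry().register_handle(raw, box);
    return box;
}
```

Because the map stores `weak_ptr`, a lookup either returns a live owner or nothing; the registry itself never keeps an object alive.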

Test Coverage

  • New tests in test_module.py for Kernel.from_handle library-mismatch
    warning and foreign kernel handle wrapping.
  • All existing tests pass unchanged.

Related Work

…work

Rename cuda/core/_graph.py to cuda/core/_graph/__init__.py to create a
package that will house the explicit graph construction module alongside
the existing stream-capture-based implementation.

Ref: NVIDIA#1317
Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 12, 2026
@Andy-Jost Andy-Jost added the enhancement (Any code-related improvements) and cuda.core (Everything related to the cuda.core module) labels Mar 12, 2026
@Andy-Jost Andy-Jost self-assigned this Mar 12, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Mar 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test b5f9970

@Andy-Jost Andy-Jost changed the title from "Add RAII infrastructure for explicit graph construction" to "Infrastructure change preparing for explicit graph construction" Mar 12, 2026
@Andy-Jost Andy-Jost changed the title from "Infrastructure change preparing for explicit graph construction" to "Infrastructure changes preparing for explicit graph construction" Mar 12, 2026
@github-actions

Phase 1 groundwork for explicit CUDA graph construction (issue NVIDIA#1317):

- Add HandleRegistry template for reverse-lookup of CUDA handles back to
  their owning shared_ptr (via weak_ptr), enabling reconstruction of
  Python objects from driver-returned handles.

- Extend EventBox with metadata fields (timing_disabled, busy_waited,
  ipc_enabled, device_id, context) accessed via get_box() pointer
  arithmetic, replacing cached Python-level fields.

- Add event and kernel reverse-lookup registries for handle recovery.

- Add Event.from_handle() and Kernel reverse-lookup integration with
  library-mismatch warning.

- Convert _graph.py to _graph/ package (rename only, no content changes).

Closes NVIDIA#1317 (partial)

Made-with: Cursor
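
The "get_box() pointer arithmetic" mentioned in the commit message can be illustrated with a small sketch. It assumes the handle is a `shared_ptr` that points at the raw-handle member inside the box, so the enclosing box is recovered by subtracting that member's offset. The layout, `FakeEvent`, and `make_event_handle` are hypothetical, the `context` field is omitted for brevity, and the PR's actual definitions may differ.

```cpp
#include <cstddef>
#include <memory>

using FakeEvent = void*;  // hypothetical stand-in for CUevent

// Assumed layout: the raw handle plus metadata in one standard-layout struct.
struct EventBox {
    FakeEvent resource;
    bool timing_disabled;
    bool busy_waited;
    bool ipc_enabled;
    int device_id;
};

// The handle exposes only the raw event, but shares ownership of the box.
using EventHandle = std::shared_ptr<const FakeEvent>;

// Hypothetical factory: allocates the box and returns a handle that aliases
// its `resource` member (shared_ptr aliasing constructor).
inline EventHandle make_event_handle(FakeEvent raw, bool timing_disabled, int device_id) {
    auto box = std::make_shared<EventBox>(EventBox{raw, timing_disabled, false, false, device_id});
    return EventHandle(box, &box->resource);
}

// Recovers the enclosing box from the pointer to its `resource` member.
// Only valid for handles created by make_event_handle above.
inline const EventBox* get_box(const EventHandle& h) {
    const char* p = reinterpret_cast<const char*>(h.get());
    return reinterpret_cast<const EventBox*>(p - offsetof(EventBox, resource));
}
```

With this arrangement the metadata travels with the handle, so no separately cached copy has to be kept in sync on the Python side.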
@Andy-Jost Andy-Jost force-pushed the graph-infra-groundwork branch from b5f9970 to 52d6f75 Compare March 13, 2026 19:02
@Andy-Jost
Contributor Author

/ok to test 52d6f75

using MapType = std::unordered_map<Key, std::weak_ptr<typename Handle::element_type>, Hash>;

void register_handle(const Key& key, const Handle& h) {
    std::lock_guard<std::mutex> lock(mutex_);
Contributor Author

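Note: CTAD (class template argument deduction) is not available with the MSVC compiler, so the std::lock_guard template argument is written out explicitly.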
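
For readers unfamiliar with the term, the two spellings differ only in whether the template argument is written out. The sketch below is illustrative: `counter_mutex`, `counter`, and both function names are hypothetical, and the deduced form is guarded by a feature-test macro so the file still compiles where deduction is unavailable.

```cpp
#include <mutex>

std::mutex counter_mutex;  // hypothetical shared state for the illustration
int counter = 0;

// Portable spelling: the template argument is stated explicitly.
int increment_explicit() {
    std::lock_guard<std::mutex> lock(counter_mutex);
    return ++counter;
}

#if defined(__cpp_deduction_guides)
// C++17 spelling: the compiler deduces std::lock_guard<std::mutex>.
int increment_ctad() {
    std::lock_guard lock(counter_mutex);
    return ++counter;
}
#endif
```

The explicit form costs a few extra characters and compiles under every standard mode, which is presumably why the quoted code uses it.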

@Andy-Jost Andy-Jost removed request for mdboom and rwgk March 13, 2026 19:46